class: center, middle, inverse, title-slide

.title[
# Analytical Paleobiology
]
.subtitle[
## with R
]
.author[
### Gregor H. Mathes
]
.institute[
### University of Bayreuth/ Paleoecology Lectures
]
.date[
### 2021/06/17 (updated: 2022-05-16)
]

---

# Overview

.pull-left[

<br>

**First ...**

- clean projects
- R-projects
- here functionality
- Tidy data
- import data
- tibbles
]

--

.pull-right[

<br>

**and then**

- Tidyverse
- wrangle
- iterate
- visualize
]

---
class: inverse, mline, center, middle

# Tidy project structure

---

# Project structure

.center[]

.footnote[
[Youtuber Rachelleea](https://www.youtube.com/channel/UCJCgaQzY5z5sA-xwkwibygw)
]

---

# Project structure

.pull-left[

<br>

**Problem**

- absolute paths
- `rm(list = ls())`
- non-reproducible results
- total mess
]

--

.pull-right[

<br>

**Solution**

- RStudio projects
- The `here` package
- disable workspace preservation
]

---

# What is real?

.center[<img src="https://d33wubrfki0l68.cloudfront.net/7fa44a5471d40025344176ede4169c5ad3159482/1577f/screenshots/rstudio-workspace.png" alt="Disable workspace preservation in RStudio" width="550"/>
]

---

# R projects

.center[]

---

# Folder structures

.center[]

---

# Here Package

<!-- .center[] -->

.center[<img src="https://raw.githubusercontent.com/allisonhorst/stats-illustrations/master/rstats-artwork/here.png" alt="Allison Horst's visualisation of the here function" width="650"/>
]

---

# The Tidyverse

.center[]

---
class: inverse, center

# The Tidyverse

### - collection of <span style = 'color:#E69F00'>easy-to-use</span> tools for data analysis and visualization

### - <span style = 'color:#E69F00'>consistent</span> in both syntax and output

### - <span style = 'color:#E69F00'>widely used</span> in industry and in science

<br>

.center[<img
src="https://peadarcoyle.com/wp-content/uploads/2019/01/hadley-wickham.jpg" alt="Picture of Hadley Wickham" width="500"/>
]

---
background-image: url(https://raw.githubusercontent.com/tidyverse/tidyverse/master/man/figures/logo.png)
background-position: 90% 10%

## `library(tidyverse)` will load
## the core tidyverse packages:

#### [ggplot2](http://ggplot2.tidyverse.org), for data visualisation.
#### [dplyr](http://dplyr.tidyverse.org), for data manipulation.
#### [tidyr](http://tidyr.tidyverse.org), for data tidying.
#### [readr](http://readr.tidyverse.org), for data import.
#### [purrr](http://purrr.tidyverse.org), for functional programming.
#### [tibble](http://tibble.tidyverse.org), for tibbles, a modern re-imagining of data frames.
#### [stringr](https://github.com/tidyverse/stringr), for strings.
#### [forcats](https://github.com/hadley/forcats), for factors.

---

# Agenda

## - read in data with the `readr` package

<br>

## - wrangle data with the `dplyr` package

<br>

## - visualise data with the `ggplot2` package

---
class: inverse, mline, center, middle

# The *readr* package

---

# readr

Function       | Reads
-------------- | --------------------------
`read_csv()`   | Comma separated values
`read_csv2()`  | Semicolon separated values
`read_delim()` | General delimited files
`read_fwf()`   | Fixed width files
`read_log()`   | Apache log files
`read_table()` | Space separated files
`read_tsv()`   | Tab delimited values

<br>

.center[and many more ...]
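---

# readr

The functions in the table differ mainly in the delimiter and decimal mark they expect. A minimal sketch, assuming readr 2.0 or later (where `I()` marks a string as literal data); the three tiny data sets and their values are made up for illustration:

```r
library(readr)

# Hypothetical inline data; I() tells readr to treat the string as file contents.
comma_text     <- I("genus,max_ma\nLutra,11.6\nUrsus,5.3\n")
semicolon_text <- I("genus;max_ma\nLutra;11,6\nUrsus;5,3\n")  # decimal comma
pipe_text      <- I("genus|max_ma\nLutra|11.6\nUrsus|5.3\n")

# Same content, three delimiters, three reader functions.
from_csv   <- read_csv(comma_text, show_col_types = FALSE)
from_csv2  <- read_csv2(semicolon_text, show_col_types = FALSE)
from_delim <- read_delim(pipe_text, delim = "|", show_col_types = FALSE)

# Each call should return a tibble with a character and a numeric column.
from_csv
from_csv2
from_delim
```

In practice the string would be replaced by a path built with `here()`.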
---

# readr

```r
dfr <- read_csv("file_name.csv")
```

<br>

--

<html>
<div style='float:left'></div>
<hr color='#EB811B' size=1px width=720px>
</html>

<br>

```r
dfr <- read_csv(here("figures/file_name.csv"))
```

<br>

--

<html>
<div style='float:left'></div>
<hr color='#EB811B' size=1px width=720px>
</html>

<br>

```r
url <- 'https://paleobiodb.org/data1.2/occs/list.txt?base_name=Carnivora&show=full'
dfr <- read_csv(file = url)
```

---

# readr

```r
carnivores <- read_csv(file = here("2021_06_18/carnivores.csv"))
```

```
## Rows: 12168 Columns: 118
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (84): record_type, identified_name, identified_rank, difference, accepte...
## dbl (15): occurrence_no, reid_no, collection_no, identified_no, accepted_no,...
## lgl (19): flags, accepted_attr, plant_organ, plant_organ2, collection_subset...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```r
carnivores
```

```
## # A tibble: 12,168 × 118
##    occurrence_no record_type reid_no flags collection_no identified_name
##            <dbl> <chr>         <dbl> <lgl>         <dbl> <chr>
##  1        117266 occ              NA NA             9070 Cynodictis lacustris
##  2        137493 occ              NA NA            11601 n. gen. Enaliarctos n.…
##  3        137495 occ              NA NA            11601 Pinnarctidion bishopi
##  4        138737 occ              NA NA            11798 Indarctos sinensis
##  5        138738 occ              NA NA            11798 Protursus sp.
##  6        138739 occ              NA NA            11798 Ursinae indet.
##  7        138740 occ              NA NA            11798 Proputorius lufengensis
##  8        138741 occ              NA NA            11798 Sivaonyx bathygnathus
##  9        138742 occ              NA NA            11798 Lutra sp.
## 10        138743 occ              NA NA            11798 Ictitherium gaudryi
## # … with 12,158 more rows, and 112 more variables: identified_rank <chr>,
## #   identified_no <dbl>, difference <chr>, accepted_name <chr>,
## #   accepted_attr <lgl>, accepted_rank <chr>, accepted_no <dbl>,
## #   early_interval <chr>, late_interval <chr>, max_ma <dbl>, min_ma <dbl>,
## #   ref_author <chr>, ref_pubyr <dbl>, reference_no <dbl>, phylum <chr>,
## #   class <chr>, order <chr>, family <chr>, genus <chr>, plant_organ <lgl>,
## #   plant_organ2 <lgl>, abund_value <dbl>, abund_unit <chr>, lng <dbl>, …
```

---

# Tibbles

## <span style = 'color:#E69F00'>data.frames</span> are the basic form of rectangular data in R (columns of variables, rows of observations)

---

# Tibbles

## <span style = 'color:#E5E5E5'>data.frames are the basic form of rectangular data in R (columns of variables, rows of observations)</span>

## `read_csv()` reads the data into a <span style = 'color:#E69F00'>tibble</span>, a modern version of the data frame

---

# Tibbles

## <span style = 'color:#E5E5E5'>data.frames are the basic form of rectangular data in R (columns of variables, rows of observations)</span>

## <span style = 'color:#E5E5E5'>read_csv() reads the data into a tibble, a modern version of the data frame.</span>

## a tibble <span style = 'color:#E69F00'>is</span> a data frame

---

# Saving data

Function            | Writes
------------------- | ----------------------------------------
`write_csv()`       | Comma separated values
`write_excel_csv()` | CSV that you plan to open in Excel
`write_delim()`     | General delimited files
`write_file()`      | A single string, written as is
`write_lines()`     | A vector of strings, one string per line
`write_tsv()`       | Tab delimited values
`write_rds()`       | A data type used by R to save objects
`write_sas()`       | SAS .sas7bdat files (haven package)
`write_xpt()`       | SAS transport format, .xpt (haven package)
`write_sav()`       | SPSS .sav files (haven package)
`write_dta()`       | Stata .dta files (haven package)

.center[
and many more...
]

---

# Troubleshooting

.center[]

---
class: inverse, mline, center, middle

# It's your turn

---

# Exercise 1

- all necessary files can be found on the e-learning page
- read through the [blog post by Malcolm Barrett](https://malco.io/2018/11/05/why-should-i-use-the-here-package-when-i-m-already-using-projects/)
- set up an R project called *paleoecology_exercises*
- in this project, create a folder called *R*
- in the folder *R*, create a new R script called *1_exercise.R*
- in the project, create a folder called *data*
- download the data file *example_dataset.csv* from e-learning
- place *example_dataset.csv* in the *data* folder
- use the *1_exercise.R* script, the **here** package, and the **readr** package to load *example_dataset.csv* into R
- in the project, create a folder called *output*
- save the data in this *output* folder as *new_dataset.csv*
- celebrate

---
class: inverse, mline, center, middle

# The *dplyr* package

---

# The main verbs of *dplyr*

## `select()`

## `filter()`

## `mutate()`

## `arrange()`

## `summarize()`

## `group_by()`

---

# The main verbs of *dplyr*

## <span style = 'color:#E69F00'><code>select()</code></span> = <span style = 'color:#56B4E9'>Subset columns (variables)</span>

## `filter()`

## `mutate()`

## `arrange()`

## `summarize()`

## `group_by()`

---

# dplyr

## `select()`

```r
select(<DATA>, <VARIABLES>)
```

---

# dplyr

## `select()`

```r
select(<DATA>, <VARIABLES>)
```

```r
carnivores
```

```
## # A tibble: 12,168 × 118
##    occurrence_no record_type reid_no flags collection_no identified_name
##            <dbl> <chr>         <dbl> <lgl>         <dbl> <chr>
##  1        117266 occ              NA NA             9070 Cynodictis lacustris
##  2        137493 occ              NA NA            11601 n. gen. Enaliarctos n.…
##  3        137495 occ              NA NA            11601 Pinnarctidion bishopi
##  4        138737 occ              NA NA            11798 Indarctos sinensis
##  5        138738 occ              NA NA            11798 Protursus sp.
##  6        138739 occ              NA NA            11798 Ursinae indet.
##  7        138740 occ              NA NA            11798 Proputorius lufengensis
##  8        138741 occ              NA NA            11798 Sivaonyx bathygnathus
##  9        138742 occ              NA NA            11798 Lutra sp.
## 10        138743 occ              NA NA            11798 Ictitherium gaudryi
## # … with 12,158 more rows, and 112 more variables: identified_rank <chr>,
## #   identified_no <dbl>, difference <chr>, accepted_name <chr>,
## #   accepted_attr <lgl>, accepted_rank <chr>, accepted_no <dbl>,
## #   early_interval <chr>, late_interval <chr>, max_ma <dbl>, min_ma <dbl>,
## #   ref_author <chr>, ref_pubyr <dbl>, reference_no <dbl>, phylum <chr>,
## #   class <chr>, order <chr>, family <chr>, genus <chr>, plant_organ <lgl>,
## #   plant_organ2 <lgl>, abund_value <dbl>, abund_unit <chr>, lng <dbl>, …
```

---

# dplyr

## `select()`

```r
select(carnivores, identified_rank, accepted_name, min_ma, max_ma)
```

```
## # A tibble: 12,168 × 4
##    identified_rank accepted_name         min_ma max_ma
##    <chr>           <chr>                  <dbl>  <dbl>
##  1 species         Cynodictis lacustris   33.9    37.2
##  2 species         Enaliarctos mealsi     23.0    28.1
##  3 species         Pinnarctidion bishopi  23.0    28.1
##  4 species         Indarctos atticus       7.25   11.6
##  5 genus           Protursus               7.25   11.6
##  6 subfamily       Ursinae                 7.25   11.6
##  7 species         Proputorius             7.25   11.6
##  8 species         Sivaonyx bathygnathus   7.25   11.6
##  9 genus           Lutra                   7.25   11.6
## 10 species         Ictitherium             7.25   11.6
## # … with 12,158 more rows
```

---

# dplyr

## `select()`

```r
# the first three calls return the same four columns
select(carnivores, occurrence_no, record_type, reid_no, flags)  # by name
select(carnivores, occurrence_no:flags)                         # range of adjacent columns
select(carnivores, 1:4)                                         # by position
select(carnivores, starts_with("c"))                            # by name prefix
?select_helpers
```

---

# dplyr

## `select()`

## <span style = 'color:#E69F00'><code>filter()</code></span> = <span style = 'color:#56B4E9'>Subset rows by value</span>

## `mutate()`

## `arrange()`

## `summarize()`

## `group_by()`

---

# dplyr

## `filter()`

```r
filter(<DATA>, <PREDICATES>)
```

### Predicates: `TRUE/FALSE` statements

--

### Comparisons: `>`, `>=`, `<`, `<=`, `!=` (not equal), and `==` (equal).
--

### Operators: `&` is "and", `|` is "or", and `!` is "not"

--

### Membership: `x %in% c(a, b)` is `TRUE` where `x` equals any of the listed values

---

# dplyr

## `filter()`

```r
filter(carnivores, accepted_rank == "species", max_ma > 7)
```

```
## # A tibble: 3,235 × 118
##    occurrence_no record_type reid_no flags collection_no identified_name
##            <dbl> <chr>         <dbl> <lgl>         <dbl> <chr>
##  1        117266 occ              NA NA             9070 Cynodictis lacustris
##  2        137493 occ              NA NA            11601 n. gen. Enaliarctos n.…
##  3        137495 occ              NA NA            11601 Pinnarctidion bishopi
##  4        138737 occ              NA NA            11798 Indarctos sinensis
##  5        138741 occ              NA NA            11798 Sivaonyx bathygnathus
##  6        147936 occ              NA NA            13061 n. gen. Atopotarus n. …
##  7        149438 occ              NA NA            13192 Allodesmus kernensis
##  8        164284 occ              NA NA            14768 Ravenictis n. sp. krau…
##  9        165362 occ              NA NA            14906 Ictidopappus n. sp. mu…
## 10        165688 occ              NA NA            14939 Pristinictis n. sp. co…
## # … with 3,225 more rows, and 112 more variables: identified_rank <chr>,
## #   identified_no <dbl>, difference <chr>, accepted_name <chr>,
## #   accepted_attr <lgl>, accepted_rank <chr>, accepted_no <dbl>,
## #   early_interval <chr>, late_interval <chr>, max_ma <dbl>, min_ma <dbl>,
## #   ref_author <chr>, ref_pubyr <dbl>, reference_no <dbl>, phylum <chr>,
## #   class <chr>, order <chr>, family <chr>, genus <chr>, plant_organ <lgl>,
## #   plant_organ2 <lgl>, abund_value <dbl>, abund_unit <chr>, lng <dbl>, …
```

---

# The main verbs of *dplyr*

## `select()`

## `filter()`

## <span style = 'color:#E69F00'><code>mutate()</code></span> = <span style = 'color:#56B4E9'>Change or add a variable</span>

## `arrange()`

## `summarize()`

## `group_by()`

---

# dplyr

## `mutate()`

```r
mutate(<DATA>, <NAME> = <FUNCTION>)
```

---

# dplyr

## `mutate()`

```r
mutate(carnivores, age_range = abs(max_ma - min_ma))
```

```
## # A tibble: 12,168 × 119
##    occurrence_no record_type reid_no flags collection_no identified_name
##            <dbl> <chr>         <dbl> <lgl>         <dbl> <chr>
##  1        117266 occ              NA NA             9070 Cynodictis lacustris
##  2        137493 occ              NA NA            11601 n. gen. Enaliarctos n.…
##  3        137495 occ              NA NA            11601 Pinnarctidion bishopi
##  4        138737 occ              NA NA            11798 Indarctos sinensis
##  5        138738 occ              NA NA            11798 Protursus sp.
##  6        138739 occ              NA NA            11798 Ursinae indet.
##  7        138740 occ              NA NA            11798 Proputorius lufengensis
##  8        138741 occ              NA NA            11798 Sivaonyx bathygnathus
##  9        138742 occ              NA NA            11798 Lutra sp.
## 10        138743 occ              NA NA            11798 Ictitherium gaudryi
## # … with 12,158 more rows, and 113 more variables: identified_rank <chr>,
## #   identified_no <dbl>, difference <chr>, accepted_name <chr>,
## #   accepted_attr <lgl>, accepted_rank <chr>, accepted_no <dbl>,
## #   early_interval <chr>, late_interval <chr>, max_ma <dbl>, min_ma <dbl>,
## #   ref_author <chr>, ref_pubyr <dbl>, reference_no <dbl>, phylum <chr>,
## #   class <chr>, order <chr>, family <chr>, genus <chr>, plant_organ <lgl>,
## #   plant_organ2 <lgl>, abund_value <dbl>, abund_unit <chr>, lng <dbl>, …
```

---

# dplyr

## `transmute()`

```r
transmute(carnivores,
          age_range = abs(max_ma - min_ma),
          age_range_sq = age_range^2)
```

```
## # A tibble: 12,168 × 2
##    age_range age_range_sq
##        <dbl>        <dbl>
##  1      3.30         10.9
##  2      5.07         25.7
##  3      5.07         25.7
##  4      4.37         19.1
##  5      4.37         19.1
##  6      4.37         19.1
##  7      4.37         19.1
##  8      4.37         19.1
##  9      4.37         19.1
## 10      4.37         19.1
## # … with 12,158 more rows
```

---

# The main verbs of *dplyr*

## `select()`

## `filter()`

## `mutate()`

## <span style = 'color:#E69F00'><code>arrange()</code></span> = <span style = 'color:#56B4E9'>Sort the data set</span>

## `summarize()`

## `group_by()`

---

# dplyr

## `arrange()`

```r
arrange(<DATA>, <SORTING VARIABLE>)
```

---

# dplyr

## `arrange()`

```r
arrange(carnivores, max_ma) %>%
  select(max_ma, everything())
```

```
## # A tibble: 12,168 × 118
##    max_ma occurrence_no record_type reid_no flags collection_no identified_name
##     <dbl>         <dbl> <chr>         <dbl> <lgl>         <dbl> <chr>
##  1 0.0117        154618 occ              NA NA            13738 Panthera tigris
##  2 0.0117        212868 occ              NA NA            21686 Felis pardalis
##  3 0.0117        212869 occ              NA NA            21686 Felis yagouarou…
##  4 0.0117        212870 occ              NA NA            21686 Conepatus semis…
##  5 0.0117        212871 occ              NA NA            21686 Eira barbara
##  6 0.0117        212872 occ              NA NA            21686 Galictis vittata
##  7 0.0117        212873 occ              NA NA            21686 Procyon cancriv…
##  8 0.0117        212918 occ              NA NA            21686 Cerdocyon thous
##  9 0.0117        212932 occ              NA NA            21687 Speothos venati…
## 10 0.0117        212933 occ              NA NA            21687 Eira barbara
## # … with 12,158 more rows, and 111 more variables: identified_rank <chr>,
## #   identified_no <dbl>, difference <chr>, accepted_name <chr>,
## #   accepted_attr <lgl>, accepted_rank <chr>, accepted_no <dbl>,
## #   early_interval <chr>, late_interval <chr>, min_ma <dbl>, ref_author <chr>,
## #   ref_pubyr <dbl>, reference_no <dbl>, phylum <chr>, class <chr>,
## #   order <chr>, family <chr>, genus <chr>, plant_organ <lgl>,
## #   plant_organ2 <lgl>, abund_value <dbl>, abund_unit <chr>, lng <dbl>, …
```

---

# dplyr

## `arrange()`

```r
arrange(carnivores, max_ma, lng) %>%
  select(max_ma, lng, everything())
```

```
## # A tibble: 12,168 × 118
##    max_ma   lng occurrence_no record_type reid_no flags collection_no
##     <dbl> <dbl>         <dbl> <chr>         <dbl> <lgl>         <dbl>
##  1 0.0117 -177.        653974 occ              NA NA            70586
##  2 0.0117 -177.        653975 occ              NA NA            70586
##  3 0.0117 -177.        653976 occ              NA NA            70586
##  4 0.0117 -176.       1447238 occ              NA NA           201978
##  5 0.0117 -173.        819689 occ              NA NA            90098
##  6 0.0117 -173.        819690 occ              NA NA            90098
##  7 0.0117 -170.       1310863 occ              NA NA           175953
##  8 0.0117 -170.       1310864 occ              NA NA           175953
##  9 0.0117 -170.       1310865 occ              NA NA           175953
## 10 0.0117 -170.       1310851 occ              NA NA           175951
## # … with 12,158 more rows, and 111 more variables: identified_name <chr>,
## #   identified_rank <chr>, identified_no <dbl>, difference <chr>,
## #   accepted_name <chr>, accepted_attr <lgl>, accepted_rank <chr>,
## #   accepted_no <dbl>, early_interval <chr>, late_interval <chr>, min_ma <dbl>,
## #   ref_author <chr>, ref_pubyr <dbl>, reference_no <dbl>, phylum <chr>,
## #   class <chr>, order <chr>, family <chr>, genus <chr>, plant_organ <lgl>,
## #   plant_organ2 <lgl>, abund_value <dbl>, abund_unit <chr>, lat <dbl>, …
```

---

# dplyr

## `desc()`

```r
arrange(carnivores, max_ma, desc(lng)) %>%
  select(max_ma, lng, everything())
```

```
## # A tibble: 12,168 × 118
##    max_ma   lng occurrence_no record_type reid_no flags collection_no
##     <dbl> <dbl>         <dbl> <chr>         <dbl> <lgl>         <dbl>
##  1 0.0117  180.       1429298 occ              NA NA           198860
##  2 0.0117  180.       1268080 occ              NA NA           168608
##  3 0.0117  178.       1447237 occ              NA NA           201977
##  4 0.0117  177.        653978 occ              NA NA            70585
##  5 0.0117  177.        805370 occ              NA NA            87900
##  6 0.0117  176.       1318554 occ              NA NA           177178
##  7 0.0117  176.       1429299 occ              NA NA           198861
##  8 0.0117  176.       1447239 occ              NA NA           201979
##  9 0.0117  175.       1268186 occ              NA NA           168679
## 10 0.0117  175.       1447240 occ              NA NA           201980
## # … with 12,158 more rows, and 111 more variables: identified_name <chr>,
## #   identified_rank <chr>, identified_no <dbl>, difference <chr>,
## #   accepted_name <chr>, accepted_attr <lgl>, accepted_rank <chr>,
## #   accepted_no <dbl>, early_interval <chr>, late_interval <chr>, min_ma <dbl>,
## #   ref_author <chr>, ref_pubyr <dbl>, reference_no <dbl>, phylum <chr>,
## #   class <chr>, order <chr>, family <chr>, genus <chr>, plant_organ <lgl>,
## #   plant_organ2 <lgl>, abund_value <dbl>, abund_unit <chr>, lat <dbl>, …
```

---
class: inverse, mline, center, middle

# The pipe

<br>

## Passes the result of one function to another function

---

# The Pipe

```r
carnivores1 <- arrange(carnivores, max_ma)
carnivores2 <- filter(carnivores1, max_ma > 7)
carnivores3 <- mutate(carnivores2, age_range = abs(max_ma - min_ma))
carnivores3
```

--

<html>
<div style='float:left'></div>
<hr color='#EB811B' size=1px width=720px>
</html>

```r
mutate(
  filter(
    arrange(carnivores, max_ma),
    max_ma > 7
  ),
  age_range = abs(max_ma - min_ma)
)
```

--

<html>
<div style='float:left'></div>
<hr color='#EB811B' size=1px width=720px>
</html>

<br>

```r
arrange(carnivores, max_ma) %>%
  filter(max_ma > 7) %>%
  mutate(age_range = abs(max_ma - min_ma))
```

---

# The Pipe

## Insert with **`ctrl/cmd + shift + m`**

---

# The main verbs of *dplyr*

## `select()`

## `filter()`

## `mutate()`

## `arrange()`

## <span style = 'color:#E69F00'><code>summarize()</code></span> = <span style = 'color:#56B4E9'>Summarize the data</span>

## <span style = 'color:#E69F00'><code>group_by()</code></span> = <span style = 'color:#56B4E9'>Group the data</span>

---

# dplyr

## `summarize()`

```r
summarize(<DATA>, <NAME> = <FUNCTION>)
```

---

# dplyr

## `summarize()`

```r
summarize(carnivores, mean_fad = mean(max_ma))
```

```
## # A tibble: 1 × 1
##   mean_fad
##      <dbl>
## 1     10.8
```

--

<html>
<div style='float:left'></div>
<hr color='#EB811B' size=1px width=720px>
</html>

<br>

```r
summarize(carnivores, mean_fad =
  mean(max_ma), sd_fad = sd(max_ma))
```

```
## # A tibble: 1 × 2
##   mean_fad sd_fad
##      <dbl>  <dbl>
## 1     10.8   12.7
```

---

# dplyr

## `group_by()`

```r
group_by(<DATA>, <VARIABLE>)
```

---

# dplyr

## `group_by()`

```r
carnivores %>%
  group_by(accepted_rank)
```

```
## # A tibble: 12,168 × 118
## # Groups:   accepted_rank [12]
##    occurrence_no record_type reid_no flags collection_no identified_name
##            <dbl> <chr>         <dbl> <lgl>         <dbl> <chr>
##  1        117266 occ              NA NA             9070 Cynodictis lacustris
##  2        137493 occ              NA NA            11601 n. gen. Enaliarctos n.…
##  3        137495 occ              NA NA            11601 Pinnarctidion bishopi
##  4        138737 occ              NA NA            11798 Indarctos sinensis
##  5        138738 occ              NA NA            11798 Protursus sp.
##  6        138739 occ              NA NA            11798 Ursinae indet.
##  7        138740 occ              NA NA            11798 Proputorius lufengensis
##  8        138741 occ              NA NA            11798 Sivaonyx bathygnathus
##  9        138742 occ              NA NA            11798 Lutra sp.
## 10        138743 occ              NA NA            11798 Ictitherium gaudryi
## # … with 12,158 more rows, and 112 more variables: identified_rank <chr>,
## #   identified_no <dbl>, difference <chr>, accepted_name <chr>,
## #   accepted_attr <lgl>, accepted_rank <chr>, accepted_no <dbl>,
## #   early_interval <chr>, late_interval <chr>, max_ma <dbl>, min_ma <dbl>,
## #   ref_author <chr>, ref_pubyr <dbl>, reference_no <dbl>, phylum <chr>,
## #   class <chr>, order <chr>, family <chr>, genus <chr>, plant_organ <lgl>,
## #   plant_organ2 <lgl>, abund_value <dbl>, abund_unit <chr>, lng <dbl>, …
```

---

# dplyr

## `group_by()`

```r
carnivores %>%
  group_by(accepted_rank) %>%
  summarise(n = n(), mean_fad = mean(max_ma))
```

```
## # A tibble: 12 × 3
##    accepted_rank      n mean_fad
##    <chr>          <int>    <dbl>
##  1 family           913   13.4
##  2 genus           3060   10.4
##  3 infraorder         7   20.5
##  4 order            172   20.8
##  5 species         7637   10.4
##  6 subfamily        277   13.1
##  7 subgenus           5    2.28
##  8 suborder           8   34.5
##  9 subspecies        27    0.467
## 10 superfamily        8   23.0
## 11 tribe             26    7.87
## 12 unranked clade    28   15.0
```

---

# dplyr

## `group_by()`

```r
carnivores %>%
  group_by(accepted_rank) %>%
  mutate(n = n(), mean_fad =
    mean(max_ma)) %>%
  select(n, mean_fad)
```

```
## Adding missing grouping variables: `accepted_rank`
```

```
## # A tibble: 12,168 × 3
## # Groups:   accepted_rank [12]
##    accepted_rank     n mean_fad
##    <chr>         <int>    <dbl>
##  1 species        7637     10.4
##  2 species        7637     10.4
##  3 species        7637     10.4
##  4 species        7637     10.4
##  5 genus          3060     10.4
##  6 subfamily       277     13.1
##  7 genus          3060     10.4
##  8 species        7637     10.4
##  9 genus          3060     10.4
## 10 genus          3060     10.4
## # … with 12,158 more rows
```

---
class: inverse, mline, middle, center

# Joins

---

# dplyr

## Joining data

### Use `left_join()`, `right_join()`, `full_join()`, or `inner_join()` to join datasets

### Use `semi_join()` or `anti_join()` to filter datasets against each other

---

# Joins

.center[]

---

# Reading list

#### [Here package](https://github.com/jennybc/here_here), intro to the `here` package.
#### [Happy git with R](https://happygitwithr.com/), how to use version control with Git/GitHub as an R user.
#### [R for Data Science](https://r4ds.had.co.nz/index.html), best intro to the tidyverse.
#### [Project-oriented workflow](https://www.tidyverse.org/blog/2017/12/workflow-vs-script/), good working habits by Jenny Bryan.
#### [Modern Dive](https://moderndive.com/foreword.html), statistical inference via the tidyverse.
#### [Cheatsheets](https://www.rstudio.com/resources/cheatsheets/), a compilation of cheatsheets for various packages.
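---

# Joins

The join verbs from the Joins section are easiest to compare on two small tables. A minimal sketch, assuming made-up tibbles named `occurrences` and `families` that share the key column `genus` (the ages are hypothetical and not taken from *carnivores.csv*):

```r
library(dplyr)
library(tibble)

# Hypothetical example tables sharing the key column `genus`.
occurrences <- tibble(
  genus  = c("Lutra", "Ursus", "Panthera", "Enaliarctos"),
  max_ma = c(11.6, 5.3, 2.6, 28.1)
)
families <- tibble(
  genus  = c("Lutra", "Ursus", "Panthera", "Canis"),
  family = c("Mustelidae", "Ursidae", "Felidae", "Canidae")
)

# Mutating joins add the `family` column to `occurrences`.
joined_left  <- left_join(occurrences, families, by = "genus")   # all rows of occurrences
joined_inner <- inner_join(occurrences, families, by = "genus")  # only matching rows
joined_full  <- full_join(occurrences, families, by = "genus")   # all rows of both tables

# Filtering joins keep or drop rows of `occurrences` without adding columns.
matched   <- semi_join(occurrences, families, by = "genus")
unmatched <- anti_join(occurrences, families, by = "genus")
```

Here `Enaliarctos` has no partner in `families`, so it gets `NA` in `joined_left`, disappears from `joined_inner`, and is the only row of `unmatched`.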
---
class: inverse, mline, center, middle

# It's your turn

---

# Exercise 2

- read the [Introduction to ggplot2 (Part 1)](https://gregor-mathes.netlify.app/2020/06/29/introduction-to-ggplot2-part-1/)
- follow along with the exercises in `exercise_ggplot.R`

.center[<img src="https://ak2.picdn.net/shutterstock/videos/15323182/thumb/8.jpg" alt="Allison Horsts visualisation of the here function" width="650"/>
]

---

# Exercise 3

- this is the final exercise for today
- perform exploratory data analysis on the *carnivores.csv* data set
- use R projects and the **here** package to structure this exercise
- use the **dplyr** functions to get insights from the *carnivores.csv* dataset
- visualise your insights via the **ggplot2** package
- upload the final visualisation to the e-learning exercise 3
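
---

# Exercise 3

As a possible starting point, the dplyr verbs from this session can be chained with the pipe into one summary. A minimal sketch on a made-up tibble `toy` (the column names mirror *carnivores.csv*, but the values are hypothetical):

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the carnivores data.
toy <- tibble(
  accepted_rank = c("species", "species", "genus", "genus", "family"),
  max_ma        = c(11.6, 5.3, 28.1, 2.6, 37.2),
  min_ma        = c(7.25, 2.6, 23.0, 0.0117, 33.9)
)

rank_summary <- toy %>%
  filter(max_ma > 3) %>%                               # drop the youngest occurrences
  mutate(age_range = abs(max_ma - min_ma)) %>%         # add a new variable
  group_by(accepted_rank) %>%                          # one group per rank
  summarise(n = n(), mean_range = mean(age_range)) %>% # one row per group
  arrange(desc(n))                                     # most frequent rank first

rank_summary
```

The resulting tibble has one row per rank and can be passed straight to `ggplot()`.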